A Stochastic Parser Based on an SLM with Arboreal Context Trees
Author
Abstract
In this paper, we present a parser based on a stochastic structured language model (SLM) with a flexible history reference mechanism. An SLM is an alternative to an n-gram model as a language model for a speech recognizer. The advantage of an SLM over an n-gram model is its ability to return the structure of a given sentence. SLMs are therefore expected to play an important part in spoken language understanding systems. Current SLMs refer to a fixed part of the history for prediction, just like an n-gram model. We introduce a flexible history reference mechanism called an ACT (arboreal context tree; an extension of the context tree to tree-shaped histories) and describe a parser based on an SLM with ACTs. In the experiment, we built one SLM-based parser with a fixed history and one with ACTs, and compared their parsing accuracies. The accuracy of our parser was 92.8%, which was higher than that of the parser with the fixed history (89.8%). This result shows that the flexible history reference mechanism improves the parsing ability of an SLM, which is of great importance for language understanding.
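The difference between a fixed and a flexible history can be illustrated with an ordinary, linear context tree over word sequences. The sketch below is not the paper's ACT, which extends the idea to tree-shaped histories of partial parses, and it is not taken from the paper. The class name `ContextTree`, the `min_count` threshold, and the toy corpus are all illustrative assumptions.

```python
from collections import Counter, defaultdict


class ContextTree:
    """Variable-length history model over word sequences (illustrative only)."""

    def __init__(self, max_depth=2, min_count=2):
        self.max_depth = max_depth  # longest history that may be used
        self.min_count = min_count  # hypothetical reliability threshold
        # Maps a history tuple (oldest word first) to counts of the next word.
        self.counts = defaultdict(Counter)

    def train(self, sentences):
        for words in sentences:
            for i, word in enumerate(words):
                # Record the word under every history suffix up to max_depth.
                for depth in range(self.max_depth + 1):
                    if depth > i:
                        break
                    self.counts[tuple(words[i - depth:i])][word] += 1

    def select_history(self, history):
        """Return the longest suffix of `history` seen at least min_count times."""
        history = tuple(history)
        for depth in range(min(self.max_depth, len(history)), 0, -1):
            suffix = history[-depth:]
            seen = sum(self.counts.get(suffix, Counter()).values())
            if seen >= self.min_count:
                return suffix
        return ()  # fall back to the empty history (unigram statistics)

    def prob(self, word, history):
        """Relative frequency of `word` after the selected history."""
        context = self.select_history(history)
        next_words = self.counts.get(context, Counter())
        total = sum(next_words.values())
        return next_words[word] / total if total else 0.0


# Toy corpus (an assumption for illustration, not data from the paper).
sentences = [
    ["the", "dog", "barks"],
    ["the", "dog", "sleeps"],
    ["a", "cat", "sleeps"],
]
model = ContextTree(max_depth=2, min_count=2)
model.train(sentences)

print(model.select_history(["the", "dog"]))  # ('the', 'dog')
print(model.select_history(["big", "dog"]))  # ('dog',)
print(model.select_history(["a", "cat"]))    # ()
```

A fixed-history model would always condition on the same number of preceding words. Here the amount of history used depends on how much evidence the training data provides for each context, which is the property the ACT carries over to tree-shaped histories.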
Similar resources
A Structured Language Model Based on Context-Sensitive Probabilistic Left-Corner Parsing
Recent contributions to statistical language modeling for speech recognition have shown that probabilistically parsing a partial word sequence aids the prediction of the next word, leading to “structured” language models that have the potential to outperform n-grams. Existing approaches to structured language modeling construct nodes in the partial parse tree after all of the underlying words h...
Tree-grammar linear typing for unified super-tagging/probabilistic parsing models
We integrate super-tagging, guided-parsing and probabilistic parsing in the framework of an item-based LTAG chart parser. Items are based on a linear-typing of trees that encodes their expanding path, starting from their anchor.
Studying impressive parameters on the performance of Persian probabilistic context free grammar parser
In linguistics, a tree bank is a parsed text corpus that annotates syntactic or semantic sentence structure. The exploitation of tree bank data has been important ever since the first large-scale tree bank, The Penn Treebank, was published. However, although originating in computational linguistics, the value of tree bank is becoming more widely appreciated in linguistics research as a whole. F...
Information Extraction Using the Structured Language Model (arXiv:cs.CL/0108023v1, 29 Aug 2001)
The paper presents a data-driven approach to information extraction (viewed as template filling) using the structured language model (SLM) as a statistical parser. The task of template filling is cast as constrained parsing using the SLM. The model is automatically trained from a set of sentences annotated with frame/slot labels and spans. Training proceeds in stages: first a constrained syntac...
Effective Constituent Projection across Languages
We describe an effective constituent projection strategy, where constituent projection is performed on the basis of dependency projection. Especially, a novel measurement is proposed to evaluate the candidate projected constituents for a target language sentence, and a PCFG-style parsing procedure is then used to search for the most probable projected constituent tree. Experiments show that, th...